-
Notifications
You must be signed in to change notification settings - Fork 4.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Shares accounts hash cache data between full and incremental #33164
Shares accounts hash cache data between full and incremental #33164
Conversation
Codecov Report
@@ Coverage Diff @@
## master #33164 +/- ##
=========================================
- Coverage 82.1% 82.1% -0.1%
=========================================
Files 785 785
Lines 211205 211208 +3
=========================================
- Hits 173432 173410 -22
- Misses 37773 37798 +25 📢 Have feedback on the report? Share it here. |
a7c7f8a
to
a1801b4
Compare
a1801b4
to
dcc5e3f
Compare
if self.should_delete_old_cache_files_on_drop { | ||
self.delete_old_cache_files(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This impl will be correct, but it currently also won't clean up the non-full ranges that exist in the newest slots.
For example, here's what the accounts hash cache dir could look like:
1. -rw-r--r-- 1 sol users 45080 Sep 8 20:14 0.2500.0.65536.7195195454841191751
2. -rw-r--r-- 1 sol users 872 Sep 8 20:31 2500.4001.0.65536.16192472734143462858
3. -rw-r--r-- 1 sol users 584 Sep 8 20:32 4064.4101.0.65536.7196231288448241015
4. -rw-r--r-- 1 sol users 1016 Sep 8 20:34 4064.4201.0.65536.13132790460150755029
5. -rw-r--r-- 1 sol users 1016 Sep 8 20:36 4064.4301.0.65536.7276174198129441015
6. -rw-r--r-- 1 sol users 1016 Sep 8 20:38 4064.4401.0.65536.14404559913584167901
7. -rw-r--r-- 1 sol users 1016 Sep 8 20:39 4064.4501.0.65536.4542188866695766599
8. -rw-r--r-- 1 sol users 1016 Sep 8 20:41 4064.4601.0.65536.11919391694395303358
9. -rw-r--r-- 1 sol users 1016 Sep 8 20:43 4064.4701.0.65536.12791177168748020520
Lines 1 and 2 are full ranges, so these ranges will not change.
Lines 3-9 all have the same starting slot, but their ending slot continues to grow, since the range is not full yet. These are created as a result of incremental snapshots. Ideally lines 3-8 should have been deleted. Without this PR, they are indeed deleted.
So we do want to delete the files with non-full ranges. (At least the ones for the newer slots)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While the current impl is not perfect, it is at least better than the status quo. Luckily, the incremental accounts hash feature gate is not enabled, so no real clusters have nodes with double the cache files.
Without this PR, IAH will duplicate all the files it needs, and requires the full accounts hash calculation to redo all the work that IAH did (to recreate the cache files, but for 'full' instead).
On pop-net, where the snapshot intervals are quite long, this ends up being a lot of duplicated work.
With this PR, we will have old files that won't get cleaned up until the next full accounts hash calculation runs. But we will save on redoing the work of creating the cache files.
I think this is a net win, as perf should be better, and the overall number of files is lower. Future PRs can cleanup these unused files sooner, if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for writing up the explanation.
yeah. keeping them util next full snapshot works fine for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
if self.should_delete_old_cache_files_on_drop { | ||
self.delete_old_cache_files(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for writing up the explanation.
yeah. keeping them util next full snapshot works fine for me.
(cherry picked from commit 6298c6c) # Conflicts: # runtime/src/accounts_db.rs # runtime/src/cache_hash_data.rs
…backport of #33164) (#33212) * Shares accounts hash cache data between full and incremental (#33164) (cherry picked from commit 6298c6c) # Conflicts: # runtime/src/accounts_db.rs # runtime/src/cache_hash_data.rs * fix merge conflicts --------- Co-authored-by: Brooks <[email protected]>
Problem
We cache the data used to calculate the accounts hash. Currently, this data is duplicated for full and incremental accounts hash calculations, but doesn't need to be.
Summary of Changes
Use the same path to cache the data used by both full and incremental accounts hash calculations.
Please see #33164 (comment) for discussion on the implementation and its implications.
(Note: There are future PRs queued up to clean up the old directory names.)
Testing
Since none of the real clusters have the IAH feature gate enabled, I wanted to test that the accounts hash calculations were still correct. I spun up a local testnet with 5 validators and let it run for multiple days. It used the default snapshot intervals (full: 25,000 slots, incremental: 100 slots), and so it has performed approximately 2,350 incremental accounts hash calculations and 98 full accounts hash calculations. No issues so far.
It has also performed an epoch accounts hash, so that confirms via consensus that all the nodes are still doing the right thing.